{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
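    "# Rank-Sum Test for Two Independent Samples (2)\n",
    "\n",
    "## Example\n",
    "\n",
    "Investigate whether the number of eosinophils in sputum differs between healthy people and patients with chronic bronchitis.\n",
    "\n",
    "### Data\n",
    "\n",
    "Raw data (one row per subject), read from `B_10_5-data.csv`:"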
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div><style>\n",
       ".dataframe > thead > tr,\n",
       ".dataframe > tbody > tr {\n",
       "  text-align: right;\n",
       "  white-space: pre-wrap;\n",
       "}\n",
       "</style>\n",
       "<small>shape: (68, 3)</small><table border=\"1\" class=\"dataframe\"><thead><tr><th>id</th><th>eosinophil cell count</th><th>group</th></tr><tr><td>i64</td><td>i64</td><td>str</td></tr></thead><tbody><tr><td>1</td><td>0</td><td>&quot;control&quot;</td></tr><tr><td>2</td><td>0</td><td>&quot;control&quot;</td></tr><tr><td>3</td><td>0</td><td>&quot;control&quot;</td></tr><tr><td>4</td><td>0</td><td>&quot;control&quot;</td></tr><tr><td>5</td><td>0</td><td>&quot;control&quot;</td></tr><tr><td>&hellip;</td><td>&hellip;</td><td>&hellip;</td></tr><tr><td>64</td><td>3</td><td>&quot;control&quot;</td></tr><tr><td>65</td><td>3</td><td>&quot;control&quot;</td></tr><tr><td>66</td><td>3</td><td>&quot;control&quot;</td></tr><tr><td>67</td><td>3</td><td>&quot;control&quot;</td></tr><tr><td>68</td><td>3</td><td>&quot;control&quot;</td></tr></tbody></table></div>"
      ],
      "text/plain": [
       "shape: (68, 3)\n",
       "┌─────┬───────────────────────┬─────────┐\n",
       "│ id  ┆ eosinophil cell count ┆ group   │\n",
       "│ --- ┆ ---                   ┆ ---     │\n",
       "│ i64 ┆ i64                   ┆ str     │\n",
       "╞═════╪═══════════════════════╪═════════╡\n",
       "│ 1   ┆ 0                     ┆ control │\n",
       "│ 2   ┆ 0                     ┆ control │\n",
       "│ 3   ┆ 0                     ┆ control │\n",
       "│ 4   ┆ 0                     ┆ control │\n",
       "│ 5   ┆ 0                     ┆ control │\n",
       "│ …   ┆ …                     ┆ …       │\n",
       "│ 64  ┆ 3                     ┆ control │\n",
       "│ 65  ┆ 3                     ┆ control │\n",
       "│ 66  ┆ 3                     ┆ control │\n",
       "│ 67  ┆ 3                     ┆ control │\n",
       "│ 68  ┆ 3                     ┆ control │\n",
       "└─────┴───────────────────────┴─────────┘"
      ]
     },
     "execution_count": 1,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import polars as pl\n",
    "\n",
    "df_raw = pl.read_csv(\"B_10_5-data.csv\")\n",
    "\n",
    "df_raw"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The tidied frequency table (number of subjects at each count value, by group) is shown below:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div><style>\n",
       ".dataframe > thead > tr,\n",
       ".dataframe > tbody > tr {\n",
       "  text-align: right;\n",
       "  white-space: pre-wrap;\n",
       "}\n",
       "</style>\n",
       "<small>shape: (4, 3)</small><table border=\"1\" class=\"dataframe\"><thead><tr><th>eosinophil cell count</th><th>patient</th><th>control</th></tr><tr><td>i64</td><td>i32</td><td>i32</td></tr></thead><tbody><tr><td>0</td><td>11</td><td>5</td></tr><tr><td>1</td><td>10</td><td>18</td></tr><tr><td>2</td><td>3</td><td>16</td></tr><tr><td>3</td><td>0</td><td>5</td></tr></tbody></table></div>"
      ],
      "text/plain": [
       "shape: (4, 3)\n",
       "┌───────────────────────┬─────────┬─────────┐\n",
       "│ eosinophil cell count ┆ patient ┆ control │\n",
       "│ ---                   ┆ ---     ┆ ---     │\n",
       "│ i64                   ┆ i32     ┆ i32     │\n",
       "╞═══════════════════════╪═════════╪═════════╡\n",
       "│ 0                     ┆ 11      ┆ 5       │\n",
       "│ 1                     ┆ 10      ┆ 18      │\n",
       "│ 2                     ┆ 3       ┆ 16      │\n",
       "│ 3                     ┆ 0       ┆ 5       │\n",
       "└───────────────────────┴─────────┴─────────┘"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "count_col = \"eosinophil cell count\"\n",
    "\n",
    "# Group labels present in the data (either \"control\" or \"patient\")\n",
    "groups = df_raw.get_column(\"group\").unique().to_list()\n",
    "\n",
    "# Frequency table: one row per distinct count value, one column per group;\n",
    "# each cell is the number of subjects in that group with that count\n",
    "df = (\n",
    "    df_raw.group_by(count_col)\n",
    "    .agg([(pl.col(\"group\") == g).sum().cast(pl.Int32).alias(g) for g in groups])\n",
    "    .sort(count_col)\n",
    ")\n",
    "\n",
    "df"
   ]
  },
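  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The frequency table and the raw data carry the same information. As an illustration that goes beyond the original analysis, the cell below regenerates the raw observations of each group from the tidied table, which is handy if `B_10_5-data.csv` is not at hand. It is a minimal sketch that assumes the frequencies shown above and expands each count value by its frequency with `numpy.repeat`; the names `counts`, `patient_freq` and `control_freq` are illustrative."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "# Frequencies copied from the tidied table (count value -> number of subjects)\n",
    "counts = np.array([0, 1, 2, 3])\n",
    "patient_freq = np.array([11, 10, 3, 0])\n",
    "control_freq = np.array([5, 18, 16, 5])\n",
    "\n",
    "# Repeat each count value as many times as it was observed in the group\n",
    "patient = np.repeat(counts, patient_freq)\n",
    "control = np.repeat(counts, control_freq)\n",
    "\n",
    "# Group sizes: should add up to the 68 rows of the raw data\n",
    "print(len(patient), len(control))"
   ]
  },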
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
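    "The grouping variable is binary, the outcome variable is ordinal with several categories, and the two samples are independent, so the Wilcoxon rank-sum test is used to test whether the populations represented by the two samples have the same median (that is, whether their distributions differ in location).\n",
    "\n",
    "### Hypotheses\n",
    "\n",
    "$ H_0 $: The populations represented by the two samples have the same median.  \n",
    "$ H_1 $: The populations represented by the two samples have different medians.\n",
    "\n",
    "### Hypothesis Test"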
   ]
  },
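  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The rank-sum procedure can also be carried out by hand on the frequency table: all $N = n_1 + n_2$ observations are ranked together, the subjects in one category are tied and share the average (mid) rank of the positions they occupy, the ranks of one group are summed into $T$, and $T$ is standardized as $Z = (T - n_1(N+1)/2) / \\sqrt{n_1 n_2 (N+1)/12}$. The cell below is a minimal sketch of that calculation, assuming the frequencies from the tidied table and taking the control group as the first sample; the variable names are illustrative. Like `scipy.stats.ranksums`, it applies no tie correction, so its $Z$ should agree with the statistic reported by the `ranksums` call."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "# Frequencies from the tidied table, ordered by count value 0, 1, 2, 3\n",
    "control_freq = np.array([5, 18, 16, 5])\n",
    "patient_freq = np.array([11, 10, 3, 0])\n",
    "\n",
    "n1, n2 = control_freq.sum(), patient_freq.sum()\n",
    "N = n1 + n2\n",
    "\n",
    "# All observations are ranked together; the subjects in one category\n",
    "# are tied and share the mean of the ranks they occupy (mid-rank)\n",
    "totals = control_freq + patient_freq\n",
    "last_rank = np.cumsum(totals)\n",
    "mid_ranks = last_rank - (totals - 1) / 2\n",
    "\n",
    "# Rank sum of the control group and its normal approximation (no tie correction)\n",
    "T = (control_freq * mid_ranks).sum()\n",
    "mean_T = n1 * (N + 1) / 2\n",
    "var_T = n1 * n2 * (N + 1) / 12\n",
    "z = (T - mean_T) / np.sqrt(var_T)\n",
    "\n",
    "print(f'T = {T}, Z = {z:.4f}')"
   ]
  },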
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Statistic: 3.432874328144372\n",
      "P-value: 0.0005972188465239742\n",
      "Significance: True\n"
     ]
    }
   ],
   "source": [
    "from scipy.stats import ranksums\n",
    "\n",
    "res = ranksums(\n",
    "    df_raw.filter(pl.col(\"group\") == \"control\").select(pl.col(\"eosinophil cell count\")),\n",
    "    df_raw.filter(pl.col(\"group\") == \"patient\").select(pl.col(\"eosinophil cell count\"))\n",
    ")\n",
    "\n",
    "print(f\"\"\"Statistic: {float(res.statistic.sum())}\n",
    "P-value: {float(res.pvalue.sum())}\n",
    "Significance: {res.pvalue.sum() < 0.05}\"\"\")"
   ]
  },
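  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Because the outcome has only four ordered categories, almost every observation is tied, and `scipy.stats.ranksums` does not adjust its variance for ties (its documentation refers to `scipy.stats.mannwhitneyu` for tie handling). As a sketch that goes beyond the original analysis, the cell below rescales $Z$ by the usual tie-correction factor $c = 1 - \\sum_j (t_j^3 - t_j)/(N^3 - N)$, where $t_j$ is the number of observations in category $j$, giving $Z_c = Z/\\sqrt{c}$, and cross-checks the result with `mannwhitneyu` (two-sided, asymptotic method, which uses a tie-corrected variance and, by default, a continuity correction). It assumes the frequencies from the tidied table. Since $c < 1$ whenever ties exist, the correction enlarges $|Z|$ here, so the conclusion at $ \\alpha = 0.05 $ should not change."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "from scipy.stats import mannwhitneyu\n",
    "\n",
    "# Frequencies from the tidied table, ordered by count value 0, 1, 2, 3\n",
    "counts = np.array([0, 1, 2, 3])\n",
    "control_freq = np.array([5, 18, 16, 5])\n",
    "patient_freq = np.array([11, 10, 3, 0])\n",
    "control = np.repeat(counts, control_freq)\n",
    "patient = np.repeat(counts, patient_freq)\n",
    "\n",
    "n1, n2 = len(control), len(patient)\n",
    "N = n1 + n2\n",
    "\n",
    "# Uncorrected Z, computed the same way as scipy.stats.ranksums\n",
    "totals = control_freq + patient_freq\n",
    "mid_ranks = np.cumsum(totals) - (totals - 1) / 2\n",
    "T = (control_freq * mid_ranks).sum()\n",
    "z = (T - n1 * (N + 1) / 2) / np.sqrt(n1 * n2 * (N + 1) / 12)\n",
    "\n",
    "# Tie-correction factor: t_j is the size of each tied category\n",
    "c = 1 - (totals**3 - totals).sum() / (N**3 - N)\n",
    "z_c = z / np.sqrt(c)\n",
    "\n",
    "# Mann-Whitney U with tie-corrected variance (continuity correction is on by default)\n",
    "res_mw = mannwhitneyu(control, patient, alternative='two-sided', method='asymptotic')\n",
    "\n",
    "print(f'c = {c:.4f}, Z_c = {z_c:.4f}')\n",
    "print(f'U = {res_mw.statistic}, P-value = {res_mw.pvalue:.6f}')"
   ]
  },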
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
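    "P < 0.05. At the significance level $ \\alpha = 0.05 $, reject $ H_0 $ and accept $ H_1 $: the number of eosinophils in sputum can be considered to differ between healthy people and patients with chronic bronchitis."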
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": ".venv",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.12.7"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
